TOP 50 Cryptocurrencies Historical Prices

Nikulin Maxim DSBA 243-2

Introduction: The cryptocurrency market has evolved dramatically in recent years, with total market capitalization reaching $3.46 trillion as of 2024. This analysis examines the historical price movements and market trends of the top 50 cryptocurrencies to give insight into the development and dynamics of the digital asset ecosystem.

The dataset is available at the following link: https://www.kaggle.com/datasets/odins0n/top-50-cryptocurrency-historical-prices?resource=download

Dataset description:

  • Date: Date of observation
  • Price: Price on the given day (also the closing price for that day)
  • Open: Opening price on the given day
  • High: Highest price on the given day
  • Low: Lowest price on the given day
  • Vol.: Volume of transactions on the given day
  • Change %: Percentage change from the previous day
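As a rough illustration of how a day-over-day percentage change can be computed, the sketch below derives a Change % column from Price with pandas. The prices are made-up sample values, not rows from the dataset, and the idea that the dataset's column was derived from consecutive closing prices in exactly this way is an assumption.

```python
import pandas as pd

# Hypothetical closing prices (not taken from the dataset)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "Price": [100.0, 110.0, 99.0],
})

# Percentage change relative to the previous row; the first row has no
# predecessor, so its value is NaN
sample["Change %"] = (sample["Price"].pct_change() * 100).round(2)
print(sample)
```

The first row stays empty because there is no earlier price to compare it with.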

Main part:

Import the main libraries used in the project:

In [15]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go

These are the first 10 rows of the dataset:

In [3]:
df = pd.read_csv(r'C:\Users\m4xni\Desktop\Проекты\HSE_project\Nikulin_243_2\Aave.csv')
print(df.head(10))
   SNo        Date  Price  Open  High   Low       Vol.  Change %
0    1  2018-01-30   0.15  0.17  0.17  0.14   530470.0     -7.95
1    2  2018-01-31   0.14  0.15  0.15  0.13   396050.0    -11.10
2    3  2018-02-01   0.11  0.14  0.14  0.11   987260.0    -17.46
3    4  2018-02-02   0.10  0.11  0.11  0.08  1810000.0     -8.32
4    5  2018-02-03   0.11  0.10  0.12  0.09  1200000.0      6.85
5    6  2018-02-04   0.09  0.11  0.12  0.09  1040000.0    -18.16
6    7  2018-02-05   0.07  0.09  0.09  0.06   756000.0    -24.39
7    8  2018-02-06   0.09  0.07  0.09  0.05   819460.0     26.28
8    9  2018-02-07   0.08  0.09  0.09  0.07   890850.0    -10.06
9   10  2018-02-08   0.09  0.08  0.09  0.08   211470.0     15.81

1. In our project all columns except Date are numeric. So, let's find:

  • Median values,
  • Average values,
  • Standard deviation of fields.
In [4]:
print("\033[1mMedian values:\033[0m")
print((df[['Price', 'Open', 'High', 'Low', 'Vol.', 'Change %']].median()).round(2))
print()

print("\033[1mMean values:\033[0m")
print((df[['Price', 'Open', 'High', 'Low', 'Vol.', 'Change %']].mean()).round(2))
print()

print("\033[1mStandard deviation values:\033[0m")
print((df[['Price', 'Open', 'High', 'Low', 'Vol.', 'Change %']].std()).round(2))
print()
Median values:
Price            0.03
Open             0.03
High             0.03
Low              0.03
Vol.        310590.00
Change %         0.00
dtype: float64

Mean values:
Price           67.05
Open            66.74
High            71.13
Low             62.49
Vol.        674127.40
Change %         5.45
dtype: float64

Standard deviation values:
Price           139.96
Open            139.63
High            148.90
Low             130.72
Vol.        1077260.55
Change %        176.11
dtype: float64
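The three statistics above can also be collected in a single table, which makes it easier to compare them side by side (for example, a mean far above the median points to a right-skewed column). The sketch below is one possible way to do this with `DataFrame.agg`; the frame `sample` holds made-up numbers and only stands in for the numeric columns of `df`.

```python
import pandas as pd

# Small made-up frame standing in for the numeric columns of df
sample = pd.DataFrame({
    "Price": [1.0, 2.0, 3.0, 10.0],
    "Vol.": [100.0, 200.0, 300.0, 400.0],
})

# One row per statistic, one column per field
summary = sample.agg(["median", "mean", "std"]).round(2)
print(summary)
```

On the real data the same call would be applied to `df[['Price', 'Open', 'High', 'Low', 'Vol.', 'Change %']]`.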

2. Let's check for NaN values using the summary statistics and drop duplicate rows:

In [5]:
print(df.describe())
               SNo        Price         Open         High          Low  \
count  1275.000000  1275.000000  1275.000000  1275.000000  1275.000000   
mean    638.000000    67.045906    66.742322    71.129875    62.490110   
std     368.205106   139.960408   139.634444   148.895685   130.723039   
min       1.000000     0.000000     0.000000     0.000000     0.000000   
25%     319.500000     0.010000     0.010000     0.010000     0.010000   
50%     638.000000     0.030000     0.030000     0.030000     0.030000   
75%     956.500000     0.580000     0.580000     0.620000     0.540000   
max    1275.000000   629.380000   629.380000   665.180000   564.850000   

               Vol.     Change %  
count  1.275000e+03  1275.000000  
mean   6.741274e+05     5.454431  
std    1.077261e+06   176.107560  
min    0.000000e+00   -38.080000  
25%    5.339500e+04     0.000000  
50%    3.105900e+05     0.000000  
75%    8.440000e+05     0.000000  
max    1.050000e+07  6284.530000  
In [6]:
df = df.drop_duplicates()

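The count of every numeric column is 1275, which matches the number of rows, so no null values were found. All of these columns were read as numbers, so the types are correct and the data is clean.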
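A more direct check for missing values and column types could look like the sketch below. It uses a small made-up frame with one deliberately missing price, so the names `sample`, `missing_per_column` and `rows_with_missing` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing Price value for demonstration
sample = pd.DataFrame({
    "Date": ["2018-01-30", "2018-01-31", "2018-02-01"],
    "Price": [0.15, np.nan, 0.11],
    "Vol.": [530470.0, 396050.0, 987260.0],
})

# Number of NaN values in each column
missing_per_column = sample.isna().sum()

# Rows that contain at least one NaN value
rows_with_missing = sample[sample.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
print(sample.dtypes)  # confirms which columns were parsed as numbers
```

Running the same three calls on `df` would show the per-column NaN counts and the dtypes explicitly.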

3. Now we can create some graphs based on our dataset.

To keep the charts easy to read, I take only the first 50 rows.

In [9]:
df['Date'] = pd.to_datetime(df['Date'])
df_50 = df.head(50).copy()
df_50['Timestamp'] = df_50['Date'].map(pd.Timestamp.timestamp)
x_50 = df_50['Timestamp']
y_50 = df_50['Price']
coefficients_50 = np.polyfit(x_50, y_50, 1)
trend_line_50 = np.poly1d(coefficients_50)
fig = px.line(
    df_50,
    x='Date',
    y='Price',
    labels={'Date': 'Date', 'Price': 'Price (USD)'}
)
fig.add_scatter(
    x=df_50['Date'],
    y=trend_line_50(x_50),
    mode='lines',
    name='Trend Line',
    line=dict(color='red', dash='dash')
)
fig.add_scatter(
    x=df_50['Date'],
    y=df_50['High'],
    mode='markers',
    name='High',
    marker=dict(color='green', size=8)
)
fig.add_scatter(
    x=df_50['Date'],
    y=df_50['Low'],
    mode='markers',
    name='Low',
    marker=dict(color='red', size=8)
)
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    template='plotly_white',
    title_x=0.5,
    width=1000,
    height=600  
)
fig.show()

The graph shows that from 2018-02-01 to 2018-03-22 the price of the cryptocurrency decreased. The scatter markers show the highest price (green) and the lowest price (red) on each date.

Now, let's create a bar chart:
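The direction of a trend can also be read from the slope of a fitted line rather than from the picture alone. The sketch below fits a first-degree polynomial with `np.polyfit` to made-up prices; measuring time in days since the first observation (instead of Unix timestamps) is a choice made here so that the slope reads as "price change per day".

```python
import numpy as np
import pandas as pd

# Made-up, steadily falling prices (not rows from the dataset)
sample = pd.DataFrame({
    "Date": pd.date_range("2018-02-01", periods=5, freq="D"),
    "Price": [0.11, 0.10, 0.09, 0.08, 0.07],
})

# Days elapsed since the first observation
days = (sample["Date"] - sample["Date"].iloc[0]).dt.days.to_numpy()

# Least-squares line: Price ≈ slope * days + intercept
slope, intercept = np.polyfit(days, sample["Price"].to_numpy(), 1)
print(f"slope per day: {slope:.4f}")
```

A negative slope means the price fell on average over the period, and a positive slope means it rose.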

In [30]:
df['Date'] = pd.to_datetime(df['Date'])
df_50 = df.head(50).copy()
fig = px.bar(
    df_50,
    x='Date',
    y=['Price', 'High', 'Low'],
    labels={'Date': 'Date', 'value': 'Price (USD)', 'variable': 'Metrics'},
    barmode='group'
)
fig.update_layout(
    title = 'Bar chart',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    template='plotly_white',
    title_x=0.5,
    width=1000,
    height=600
)
fig.show()

By looking at the chart, you can identify patterns or trends, such as whether the price remains stable or fluctuates significantly between High and Low values over the selected timeframe. For any given date, the High bar will always be above or equal to Price, and Low will be below or equal to Price.
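The relationship High ≥ Price ≥ Low can be verified in code instead of by eye. The sketch below runs the check on the first three rows shown in the dataset preview; the names `consistent` and `violations` are illustrative.

```python
import pandas as pd

# First three rows from the dataset preview
sample = pd.DataFrame({
    "Price": [0.15, 0.14, 0.11],
    "High": [0.17, 0.15, 0.14],
    "Low": [0.14, 0.13, 0.11],
})

# True for rows where the price lies between the daily low and high
consistent = (sample["High"] >= sample["Price"]) & (sample["Price"] >= sample["Low"])

# Rows that break the expected ordering, if any
violations = sample[~consistent]
print(f"{len(violations)} inconsistent rows")
```

Applied to the full `df`, any rows listed in `violations` would deserve a closer look before plotting.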

In [28]:
df['Date'] = pd.to_datetime(df['Date'])
df_50 = df.head(50).copy()
fig = go.Figure()
fig.add_trace(go.Candlestick(
    x=df_50['Date'],  # Dates for the x-axis
    open=df_50['Change %'],  # Change as opening values
    high=df_50['Change %'] + df_50['Vol.'],  # Change + Vol. as high
    low=df_50['Change %'] - df_50['Vol.'],  # Change - Vol. as low
    close=df_50['Change %'],  # Change as closing values
    increasing_line_color='green',  # Color for increasing candles
    decreasing_line_color='red',  # Color for decreasing candles
    name='Candlestick (Vol. & Change)'
))
fig.update_layout(
    title = 'Candlestick chart',
    xaxis_title='Date',
    yaxis_title='Change & Volume',
    template='plotly_white',
    title_x=0.5,
    width=1000,
    height=600
)
fig.show()

X-Axis (Dates): The horizontal axis shows the dates, representing the time period covered by the first 50 data points in chronological order. This allows tracking changes and volumes over specific time intervals.

Y-Axis (Change & Volume): The vertical axis represents the Change and its fluctuation influenced by Vol. Positive and negative values of Change are visualized, with Vol. determining the range for each candlestick.

Candlestick Components:

Open & Close (Change): The candlestick's body represents the Change value during the specified period.

High & Low (Vol.): The wicks (lines above and below the body) show the maximum and minimum values of Change, calculated using Vol.:

  • High = Change + Vol.
  • Low = Change - Vol.

Colors:

  • Green candlesticks represent an increase or no change during the time period.
  • Red candlesticks indicate a decrease in value during the time period.

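Insights: This chart provides a visual representation of the volatility in Change influenced by the Vol. parameter. It can help identify periods of high fluctuation (large candlesticks) or stability (small candlesticks). Note that Vol. is several orders of magnitude larger than Change % in this dataset, so the length of each candlestick mostly reflects the trading volume of that day.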

In [29]:
df['Date'] = pd.to_datetime(df['Date'])
df_50 = df.head(50).copy()
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_50['Date'], 
    y=df_50['High'], 
    mode='lines',
    line=dict(color='green', width=1),
    name='High'
))
fig.add_trace(go.Scatter(
    x=df_50['Date'], 
    y=df_50['Low'], 
    mode='lines',
    line=dict(color='red', width=1),
    name='Low'
))
fig.add_trace(go.Scatter(
    x=df_50['Date'], 
    y=df_50['Open'], 
    mode='markers',
    marker=dict(color='blue', size=6),
    name='Open'
))
fig.update_layout(
    title='OHLC Chart',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    template='plotly_white',
    title_x=0.5,
    width=1000,
    height=600
)
fig.show()

The green line represents the highest price achieved during each time interval, while the red line shows the lowest price. The blue markers indicate the starting prices, providing a point of reference for each interval's price movements.

Periods with larger gaps between the green and red lines indicate high volatility, reflecting significant fluctuations in price during those intervals. Conversely, narrower gaps suggest stability with minimal price movement. The blue markers often align closer to either the green or red lines, reflecting whether the price trend was predominantly upward or downward.

This visualization captures short-term price behavior, showing trends of increase or decrease over time. Steep divergences between high and low values may indicate impactful market events or increased trading activity, while closely aligned lines suggest quieter market conditions. The chart serves as a concise tool for identifying trends and assessing market volatility within the selected period.
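The gap between the High and Low lines can be turned into a number, which makes the most volatile days easy to find. The sketch below computes a daily range on made-up values; the column name `Range` and the variable `widest` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Made-up high and low prices (not rows from the dataset)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "High": [10.0, 12.0, 11.0],
    "Low": [9.0, 8.0, 10.5],
})

# Daily range: distance between the highest and lowest price of the day
sample["Range"] = sample["High"] - sample["Low"]

# Row with the widest gap, i.e. the most volatile day in the sample
widest = sample.loc[sample["Range"].idxmax()]
print(widest)
```

Sorting `df_50` by such a column would list the intervals where the green and red lines are furthest apart.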

In [42]:
x_col = 'High' 
y_col = 'Low'
z_col = 'Change %'
data_50 = df.head(50)
# Group by to aggregate duplicates (if any)
data_grouped = data_50.groupby([y_col, x_col], as_index=False)[z_col].mean()
heatmap_data = data_grouped.pivot(index=y_col, columns=x_col, values=z_col)
fig = px.imshow(
    heatmap_data,
    color_continuous_scale="RdBu",
    zmin=heatmap_data.min().min(),
    zmax=heatmap_data.max().max(),
    title="Heatmap",
    labels={"color": "Intensity"}
)
fig.update_layout(
    width=1200, 
    height=800,
)
fig.show()